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Abstract 

Heteroscedastic two-way ANOVA models are frequently encountered in experimental sciences, e.g., biology, psychology and 
physics among others. In the literature, classical F-tests are often blindly employed although they are often biased even 
for moderate heteroscedasticity. To overcome this problem, several approximate tests, e.g., generalized P-values, bootstrap, 
and permutation-based tests among others, have been proposed in the literature. These tests, however, are computationally 
intensive or do not work well in terms of size controlling and power. In this paper, we study tests of linear hypotheses in 
heteroscedastic two-way ANOVA via proposing a modified Bartlett (MB) test and a parametric bootstrap (PB) test. The MB 
test is easy to implement via using the usual y 2 -distribution and it is shown to be invariant under affine transformations, 
different choices of the contrast matrix used to define the same hypothesis and different labelling schemes of the cell means. 
The PB test, on the other hand, is easy to implement but it is time-consuming. Simulations and real data applications show 
that the MB test and the PB test are comparable and they both outperform the classical F-test in terms of size controlling and 
power for various sample sizes and parameter configurations. 
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1 Introduction 


A two-way analysis of variance (ANOVA) aims to compare the effects of several levels of two factors in a factorial experiment 
with two-way layout. It is widely used in experimental sciences, e.g., biology, psychology, physics, among others. There is a vast 
literature concerned with two-way ANOVA using classical F-tests. A good survey is given by Fujikoshi (1993). The classical F-test 
works well when the cell variances are homogeneous. However, when this homogeneity assumption is not satisfied, the classical 
F-test is either too conservative (the empirical size is much smaller than the nominal one) or too liberal (the empirical size is much 
larger than the nominal one) as demonstrated in Table 2. 

In real data analysis, the homogeneity assumption for cell variances is often difficult to check and is often not satisfied. If this 
is the case, two-way ANOVA is usually referred to as heteroscedastic two-way ANOVA. An application of the classical F-test 
to the main and interaction effect tests of heteroscedastic two-way ANOVA may lead to misleading conclusions (Wilcox 1989, 
Krutchkoff 1989, Ananda and Weerahandi 1997). To solve this problem, several approximation tests have been proposed. Wilcox 
(1989) proposed two such tests. The first one generalized lames (1954)'s second order test. Since James's second order test has 
an extremely complicated form, so is Wilcox's version. Wilcox's (1989) second test is not invariant with respect to permutation 
and it has a low power according to the simulation results presented in Wilcox (1989). Krutchkoff (1989) proposed a simulation- 
based approximate test. Ananda and Weerahandi (1997) proposed another simulation-based test using the concept of generalized 
P-values, resulting in a so-called generalized F-test. Recently, some more simulation-based approaches have been proposed for 
ANOVA models. Bao and Ananda (2001) and Mu et al. (2008) gave further studies on the generalized P-values method. Xu et al. 
(2013) proposed a parametric bootstrap test and compared it with the generalized F-test of Ananda and Weerahandi (1997). More 
recently, Konietschke et al. (2015) proposed a bootstrap method while Pauly et al. (2015) studied a permutation test for general 
ANOVA problems. 

Note that the above simulation-based tests are generally computationally intensive or do not work well in terms of size controlling 
and power. Therefore, it is still of interest to propose and study some simple and accurate tests for the main and interaction effect 
tests of heteroscedastic ANOVA or MANOVA. Much work has been done in this direction. Examples include Zhang (2012a,b) who 
studied an approximate degrees of freedom F-test for heteroscedastic two-way ANOVA and one-way MANOVA respectively, and 
Harrar and Bathke (2010) who proposed modified Wilks likelihood ratio, Lawley-Hotelling trace and Bartlett-Nanda-Pillai tests 
for heteroscedastic two-way MANOVA among others. Zhang (2011) noted that Harrar and Bathke's modified MANOVA tests 
are not affine invariant. To address this problem, he proposed an approximate Hotelling T 2 test using Wishart-approximation. 
Harrar and Bathke (2010)'s modified MANOVA tests were further improved by Zhang and Xiao (2012). 

In this paper, alternatively, we propose and study two new tests, a modified Bartlett (MB) test and a parametric bootstrap (PB) 
test for the general linear hypothesis testing (GLHT) problem in heteroscedastic two-way ANOVA which includes the main. 
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interaction effect, post hoc, and contrast tests among others as special cases. The MB test is obtained via an application of the 
modified Bartlett correction of Fujikoshi (2000) to a Wald-type statistic constructed for the GLHT problem. The MB test is easy to 
implement via the usual \ 2 -distribution and it is invariant under affine transformations, different choices of the contrast matrix 
used to define the same hypothesis and different labelling schemes of the cell means. The PB test, on the other hand, is easy to 
implement but it is time-consuming. Simulation studies and real data applications presented in this paper demonstrate that (1) 
the PB and MB tests are comparable in terms of size controlling and power but the MB test requires much less computational work 
than the PB test; and (2) the classical F test is very liberal or conservative for heteroscedastic two-way ANOVA in many cases of 
the simulation studies presented in this paper. 

The rest of the paper is organized as follows. The methodologies for the PB and MB tests are presented in Section 2. Simulation 
results are presented in Section 3. A real data example is presented in Section 4. Finally, some technical proofs of the main results 
are given in the Appendix. 


2 Methodologies 

2. 1 Tests of Main and Interaction Effects in Heteroscedastic Two-way ANOVA 

Consider a two-way experiment with two factors A and B having a and 6 levels respectively with a total of ab factorial combina- 
tions or cells. Suppose at the (i, j)-th cell, we have a random sample: yijk, k = 1, 2, • • • , rnj satisfying the following model: 


tjijk — III, A tijki r ): j k ~ N(0, <7 r .j ) , k — 1 , • 


(1) 


where p ( , and are the associated cell mean and variance. All these ab samples are assumed to be independent with each other 
but there is no knowledge about if the cell variances ajj, i = 1, - • • , a; j = 1, • • • , b are equal to each others. 

For a two-way ANOVA model, the cell means pij are usually decomposed into the form pij = po + a ;+/?,■ +7y, i = 1,2 = 

1, 2, • • • , 6, where po is the grand mean, oa and /3j are the i-th and y-th main effects of factors A and B respectively, and 7,., is the 
(i, j)-th interaction effect between factors A and B so that (1) can be further written as the following well-known two-way ANOVA 
model: 

Uijk = p 'T rr, -\- 3j + 7 ij + tijk, f-ijk ~ v ( 0 . ay ), 
k = 1, 2, • • • , riij ; f — 1,2, = 1,2, 

For this model, interested are the following three null hypotheses: 


( 2 ) 


H oa 

Hob 

Hqab 


oil = 02 = • ' ' = Ola 5 = 0 , 

/3l = p2 = • • • = fib = 0 , 

711 = ' ' ' = 716 = ' ' ' = 7ol = ' ' ' = Tab = 0. 


(3) 


The first two null hypotheses aim to test if the main effects of the two factors are significant while the last one aims to test if the 
interaction effect between the two factors is significant. The model (2) is not identifiable since the parameters po, cm, /3j and 7 y 
are not uniquely defined unless further constraints are imposed. Given a sequence of positive weights uy , i = 1, 2, • • • , a; j = 
1, 2, • • • , b, we impose the following constraints: 


^ Wi.ai = 0, ^ w.j/3j = 0 
i= 1 i= 1 

a 

^ ^ 'Wij'lij = 0 , j = 1 , 2 , 

i = 1 
b 

= 0 , i = 1 , 2 , 

3 = 1 

a b 

EE Wijiij = 0, 


6 — 1 . 


a — 1, 


(4) 

(5) 

( 6 ) 
(7) 


i= 1 j = l 


where up = y^ _ | uy and w.j = uy . Notice that we here use only a + 6 + 1 constraints that imply the a + b + 2 

constraints suggested by Fujikoshi (1993) and adopted by Ananda and Weerahandi (1997). This is because (5) jointly with (7) 
implies that (6) holds for i = a while (6) jointly with (7) implies that (5) holds for j = b. Set a = [ay • • • , a a ] T , /3 = [/3i, • • • , /3b] T , 
and 7 = [711 , • • • , 716, • • • , 7oi , • • • , 7 a b] T ■ Then under (4)~(7), simple algebra shows that the three null hypotheses (3) can be 
equivalently written as 


( 8 ) 


Hoa : 

H a a 

= 0 , 

where H„ = 

(^Ia-l, 1 a 

Hob : 

H b (3 

= 0 , 

where Hi, = 

(l b - 1, — lfo- 

Hqab : 

H ab ~f 

= 0 , 

where H ab = 

H„ ® H b , 
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where and throughout, I r and l r denote the identity matrix of size r and the / -dimensional vector of ones respectively, and ® 
denotes the Kronecker product operation. The matrices H a , H b and H ab are full rank contrast matrices whose rows sum up to 0. 

The contrast matrices H a , H b and H ab are not unique. For example, H a = ( — l a -i, 7 a _i) is also a contrast matrix associated 
with the null hypothesis Ho a- However, we later will show that the MB and PB tests, which will be proposed in this paper, will 
not be affected by different choices of the contrast matrices. 

When the weights can be written as w b j = UiVj, i = 1,2, = 1, 2, • • • , 6 such that m > 0, m = 1, and Vj > 

0, '}2 b j=i v j = 1> we can easily identify the parameters /ro, Qi, /3j, and yij as /io = v-jV-jUg, cnj = X^ =1 VjfHj — Ho, Pj = 

^ / - i UiHij Ho, and 77 — pij ot,% fdj Ho- Let u — [i/i, * * * , Ua] , u — [i/i, * * * , r//>] and p, — [/rn, • • • , /rib, • • • , jJa i , * * * , /Tab] 

Denote the s-dimensional unit vector whose r-th component is 1 and others are 0 as e r , s . Then we further have /t 0 = [u T ® v T ]p, 
on = [(ei, a ~ u) T ® v T ]p, /3j = [u T ® (e jib - v) T ]p, and 7^ = [(e ijQ , - u) T ® (e j}b - v) T ]p, i == 1, 2, • • • , a; j = 1, 2, • • • , 6. In 
matrix notation, we can further write 

a = A a p, (3 = A b p, and 7 = A ab p , 

where A a = (I a - 1 a u T ) ® v T , A b = u T ® (I b - l b v T ), and A ab = ( I a - 1 a u T ) ® ( I b - l b v T ). 

There are a few methods which can be used to specify the weights w ,,- , i = 1, 2, • • • , a; j = 1, 2, • • • , 6; see for example, Fujikoshi 
(1993). In this paper, we shall use the following two simple methods: the equal-weight method and the size-adapted-weight 
method. Both the methods specify the weights as wp = UiVj, i = 1, 2, • • • , a; j = 1, 2, • • • , b with the equal-weight method 
specifying u and v as Ui = l/a,Vj = 1/6, i = 1,2 = 1, 2, •••, 6 while the size-adapted-weight method specifying u and 

v as ui = Xo=i n 7 / -V, i = 1, 2, • • • , a and Vj = J/“ =] riij/N,j = 1, 2, • • • , b where N — ^? =1 ]Cj= 1 n H denotes the total sample 
size. Under the equal-weight method, the matrices A a , A b and A ab have the following simple forms: 

A a = (I a - A b ^±L® (I b - ^), = (I a - ® (7b - 

a 0 a 0 a b 

where J m = 1,„1^ is a m x m matrix of ones. When the two-way ANOVA design is balanced, i.e., when all the cell sizes 
n,, , i = 1, 2, • • • , a; j fa 1, 2, • • • , 6 are the same, the size-adapted-weight method reduces to the equal-weight method. 

It is now clear that each of the testing problems associated with the three null hypotheses (8) can be equivalently expressed in the 
form of the general linear hypothesis testing (GLHT) problem (10) as described in the next subsection with C respectively being 

C a = H a A a , C b — H b A b , C ab = H ab A ab . (9) 


2.2 Modified Bartlett Test for Linear Hypotheses 


Using the cell mean vector p defined in the previous subsection, we can write the GLHT problem under the heteroscedastic 
two-way ANOVA model (2) as: 

Ho : Cp = c, vs Hi : Cp ^ c, (10) 


where C : q x (ab) is a known coefficient matrix of full rank with rank(C) = q and c : q x 1 is a known constant vector, often 
specified as 0. The above GLHT problem is very general. In fact, the contrast matrices 77 a , H b , and H ab in C a , C b , and C ab can 
be replaced by any other proper contrast or non-contrast matrices. For example, to test H 0 : ai — .5«3 — .5a7 = 0, we only need 
set c = 0 and C = jei, a — .5e3 j0 — .5e7, 0 ] A a , and to test Ho : (H 3 = 4/3s, we only need set c = 0 and C = [e 3 , b — 4eg , b \A b where A a 
and A b are as defined in the previous subsection. To construct the test statistic for (10), we denote the usual unbiased estimators 
of the cell means and cell variances of the random sample (1) as: 

n ij n ij 

flij = Tl^j ^ Dijki &ij — ( flij — 1 ) ^ (yij k ~ frij) , i = 1 , * * * , a] j = 1 , • • • , 6 . ( 11 ) 

k = 1 k = 1 


Set (l = [All , * * * , AlbK * * * ? Aal , * 

2 2 _2 2 2 

r\\c\(y( Zll. CT 12 . . . a lb . . . a a 1 J a2 . . 

oV nil J ni2 ’ 5 n lb ’ ’ n al ’ n a2 ’ 

Wald-type statistic: 


• • ,Aa&] T as the estimator of \i. 

Since CA - c - N q (C>i 

T = (Cp. - c) T (C±C T ) _1 (Cfi 


Then p, ~ N ab (p, S) where S = 
— c,CSC T ), we can construct the following 

- c), (12) 


where S = diag( " ' ' , a '~ , 

o v nn 7 7ii2 7 

re-express T as: 



n a i ’ n a 2 5 


). To construct the MB test, following Yanagihara and Yuan (2005), we 
T = z T W~ 1 z, (13) 


where z = (CSC T ) 1 ^ 2 (Cf. i — c), W = h£,H, and 77 = (CSC T ) 1 ^ 2 C. It is easy to see that under the null hypothesis, 
Z ~ N q (0, Iq). Let n m in = min “ =1 minj =1 mj and n ma x = m£ix “ =1 max^-j mj denote the minimum and maximum cell sizes 
respectively. To study the asymptotic null distribution of T, we impose the following condition: 


As n n 


oo, 


<00, i = 1,2, • • • ,a-,j = 1,2,- • 


(14) 
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This condition requires that all the cell sizes tend to infinity at about the same rate so that n m i n (CSC T ) tends to a non-singular 
matrix as n m in — > oo. We then have the following result. 


Theorem 1 Under the condition (14) and Ho, as n m m 


oo, we have T 


xl- 


The above theorem shows that T asymptotically follows a y 2 -distribution with q degrees of freedom. Similar results for some 
special C must have appeared in the literature but we here show that it is true for the GLHT problem (10). Based on Theorem 1, 
we can construct a x 2 -test. However, it is well known that a y 2 -test for (10) usually converges very slowly. In fact, from the proof of 
Theorem 1 in the Appendix, we can see that the convergence rate of T to y 2 is of O p (n~ ) . This indicates that the y 2 -test based on 

T can be very inaccurate when n m i n is small or moderate. To improve the convergence rate of T, the well-known Bartlett correction 
may be used. Recently, Fujikoshi (2000) proposed a modified Bartlett (MB) correction which can be used to improve the y 2 -test 
further. We feel that this MB correction can be generally used to improve an asymptotical y 2 -test provided that the asymptotical 
expressions for the mean and variance of the associated test statistic are available and follow some required structures. In this 
paper, we aim to illustrate how this technique can be used for (10). Let hij = (C'EC T )~ 1 ' 2 Cij, i = 1,2 ,■■■ ,a;j = 1, 2, • • • , b 
where Cij is the (ij )-th column of C. To apply the MB correction to T, we need the following result. 


Theorem 2 Under the condition (14) and Ho, as n m in — t oo, we have 

E(T) = q( 1 + + 0(n" 2 J and E(T 2 ) = q(q + 2)(1 + -^) + 


(15) 


where m = 


!ak A = ( 14 + 4 qQ" min 

<1 ’ 2 9(9 + 2 ) 


A, and A = £“ =1 £j=i 


-H-h 1 h 


/( mj — 1). In addition, we have 


(n m ax l)ab 


< A < 


- 1 


(16) 


Theorem 2 allows us to apply the MB correction to T using the log-transformation Tub = (n m i n /3i + P 2 ) log(l + T /}i ), where 
Pi = a2 2 2ai and p 2 = (q+2 2 !(l 2 2 ~l 2 2 a^) 4)Q!1 ■ By Fujikoshi (2000), we have T MB -^+ y 2 as n min -+ 00 with E(T MB ) = q + 0(n“ 2 J and 
E(T 2 b ) = q(q + 2) + On contrast, by Theorem 2, we only have E(T) = q + 0(n~f n ) and E(T 2 ) = q(q + 2) + 0(n“ 1 1 n ). 

That is, the first two moments of T M b converge to the first two moments of y 2 much faster than those of T do. It is then expected 
that T mb converges to y 2 much faster than T does (Fujikoshi 2000). 

In real data analysis, Pi and P 2 are unknown. They must be estimated from the data. Good estimators can be obtained as long as 
a good estimator of A is supplied. By Theorem 2, a natural estimator for A is: 


A = ^ ^(nij - 1) 1 
i= 1 j=l 


r -2 n 2 

a a ,• - t - 

— -h^ hij 

mj 


a b 


i = 1 3=1 


1)“ 


-^cl (CtC 7 


(17) 


The estimators di, 0 : 2 , Pi, and p 2 are then obtained accordingly. By the law of large numbers, it is classical to show that a i: j — > 
a 2 j, i = 1, 2, • • • , a; j = 1, 2, • • • , b almost surely as n m m — > 00 . It follows that as n m i n — t 00 , we have and i = 1, 2 all tend 

to 1. That is, they are ratio-consistent. Therefore, we continuously have 


Tmb — (riminPl + P2) log(l + 


as n n 


(18) 


Some simple algebra gives that n m inPi = 9( -^ 2) and n m i n /9i + P 2 = < ~ q+2 ' > ^ ■ From the proof of Theorem 2, one can easily see 

that the range of A given in (16) is also the range of A. Thus, provided n min > 2, we always have > 0 and n m i n /3i + P 2 > 0. 

This guarantees that T M b is a nonnegative and monotonically increasing function of T. 

The critical value of T MB can be specified as y 2 (l — a) for any given significance level a. We reject the null hypothesis of (10) when 
this critical value is exceeded by T M b- The MB test can also be conducted via computing the P-value using the y 2 -distribution (18) 
easily. 


2.3 Some Desirable Properties of the MB Test 

Notice that for each of the null hypotheses (8), the associated contrast matrix H a , Hb or H a b is not unique. For example, H a = 
^ — la-i, I a - 1 ^ is also a contrast matrix for the first null hypothesis in (8). It is known from Kshirsagar (1972, Ch. 5, Sec. 4) 
that for any two contrast matrices H * and JT» specifying the same null hypothesis, there is a nonsingular matrix P such that 
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H, = PH,, where the subscript * denotes one of the subscripts a, b and ab, used for the three null hypotheses in (8). By (9), 
the C-matrix associated with the contrast matrix H , can be expressed as C, = ( H,A ,). Let C* be the C-matrix associated with 
the contrast matrix H,. Then we have C* = ( H,A ») = (PH, A,) = PC,. Theorem 3 shows that the MB test is invariant to 
different choices of the contrast matrix for the same hypothesis. 

Theorem 3 The MB test is invariant when C and c in (10) are replaced with 

C = PC and c = Pc, (19) 


respectively where P is any nonsingular matrix. 

In practice, the observed cell responses are often re-centered or rescaled before any inference is conducted. Recentering and 
rescaling are special cases of the following affine transformation: 

Vijk Mjijk T k 1; 2, • * • , riij, i 1, 2, • • • , a, j 1? 2, • • • , b, (20) 

where A ^ 0 and £ are two known constants. 

Theorem 4 The MB test is invariant under the affine transformation (20). 

Finally, we have the following result. 

Theorem 5 The MB test is invariant under different labelling schemes of the cell means ptj,i = 1,2 ,■■■ ,a;j = 1, 2, • • • , b. 


2.4 Parametric Bootstrap Test for Linear Hypotheses 

The null distribution of the Wald-type statistic T (13) can also be approximated by a parametric bootstrap (PB) method. It is 

.2 y 2 

(7 . X n . . — 1 

well known that under the heteroscedastic two-way ANOVA model (1), we always have rij = ~ n ‘'L 1 » * = 1,2 ,■■■ ,a\j = 

ij % 3 

1, 2, • • • , 6. Let R = diag(rn, • • • , r u , ■ ■ ■ , r a i, ■ ■ ■ , r ab ). Then the distribution of R is known so that when £ is known, the 
distribution of W = H±H T = HY) 1 / 2 RV 1/2 H is known where H = (CSC 7 ' ) 1 ^ 2 C as defined previously and X = Y means 
X and Y have the same distribution. The PB test works via approximating the null distribution of T by the distribution of the 
bootstrap pivotal statistic T b = z T Wf 1 z where z ~ N q ( 0, I q ) and W b — RS 1 ^ H T with H = (CSC T ) _1 ^ 2 C. That is, 

it rejects the null hypothesis when P(T b > T obs ) < a where a is the given significance level and T obs is the observed value of T as 
given in (12). The left-hand side probability has to be evaluated using Monte Carlo via simulating (z, R) a large number of times. 
It is worthwhile to mention that we can show that Theorems 3, 4, and 5 also hold for the PB test. 


3 Simulation Studies 


In this section, we aim to compare the classical F, PB, and MB tests for two-way ANOVA via simulations. We implemented the 
three tests using the Matlab software. Upon an email request, readers can get the related Matlab codes from the corresponding 
author. 

We first describe how the data in a two-way ANOVA model are generated. We do not need to directly generate the raw data; 
rather we need only to generate the sample cell means and variances using the given cell sizes, population means and variances 
since all the three tests are conducted using the sample cell sizes, means and variances only. For a given cell size vector n = 
[nil, ni2, • • • , n ab \, a cell mean vector p = \pn, pi2, ■ ■ ■ , p a b] and a cell variance vector cr 2 = [<7 2 l5 cr 2 2 , • • • , rr 2 ab ], a sample cell mean 

vector p = [pii,pi2, • • • , pab] and a sample cell variance vector a 2 = [af , <j 2 2 , • • • , < t 2 6 ] can be generated as pij ~ N (ji v j , af /n ZJ ) 
a 2 

and a 2 j ~ n .‘i 1 X^,-ii i = 1,2 ,■■■ ,a;j = 1, 2, • • • , b. An illustrative example with a = b = 4 is presented in Table 1 in which the 
cell sizes, population means and standard deviations are listed in the columns associated with mj, p^, and a,, respectively while 
the generated cell means and standard deviations are listed in the columns associated with jiij and &ij respectively. 

As an example, we applied the F, PB, and MB tests to the data in Table 1 in checking if there is an interaction effect between the two 
factors. In all the computations, the size-adapted-weight method was used. The P-values of the PB and MB tests were obtained as 
55.5% and 56.1% respectively, suggesting that there is no strong evidence to reject the null hypothesis. However, the P-value for 
the classical F-test is 3.98%, suggesting a strong rejection for the null hypothesis. Therefore, in this example, the F-test produced 
a misleading conclusion while the PB and MB tests did not. This is because according to Table 1, the true cell means are all 0 so 
that the true interaction effect between the two factors is actually 0. This example shows the danger to apply the F-test blindly in 
a heteroscedastic two-way ANOVA model. In fact, this is not a single such case. The last row of Table 2 shows that at the nominal 
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TABLE 1 AN ILLUSTRATIVE EXAMPLE FOR DATA GENERATION IN A TWO-WAY ANOVA MODEL 


[hj\ 

n ij 

flij 

a ij 

fiij 


[*,j] 

riij 

f^ij 

a ij 

fiij 

3'ij 

[1,1] 

4 

0 

6 

0.404 

3.200 

[3,1] 

4 

0 

6 

-0.367 

6.693 

[1,2] 

8 

0 

1 

-0.095 

0.855 

[3,2] 

8 

0 

1 

-0.168 

1.028 

[1,3] 

12 

0 

1 

0.211 

0.764 

[3,3] 

12 

0 

1 

0.093 

0.814 

[1,4] 

20 

0 

1 

-0.074 

1.187 

[3,4] 

20 

0 

1 

0.156 

0.961 

[2,1] 

4 

0 

6 

-3.574 

2.706 

[4,1] 

4 

0 

6 

-2.254 

3.212 

[2,2] 

8 

0 

1 

-0.520 

0.857 

[4,2] 

8 

0 

1 

0.255 

0.514 

[2,3] 

12 

0 

1 

0.351 

0.706 

[4,3] 

12 

0 

1 

0.140 

0.685 

[2,4] 

20 

0 

1 

-0.102 

0.719 

[4,4] 

20 

0 

1 

0.219 

1.066 


TABLE 2 EMPIRICAL SIZES AND POWERS OF THE F, PB AND MB TESTS FOR VARIOUS CELL SIZES AND PARAMETER CONFIGURATIONS 


cr 

n 

F 

5 = 0 
PB 

MB 

F 

5 = 2 
PB 

MB 

F 

5 = 4 
PB 

MB 

F 

5 = 6 
PB 

MB 


( 11 , 11 , 11 , 11)4 

.050 

.049 

.044 

.222 

.203 

.193 

.819 

.774 

.771 

.994 

.990 

.994 

( 1 , 1 , 1 , 1)4 

( 10 , 20 , 20 , 30 % 

.050 

.049 

.051 

.374 

.342 

.354 

.973 

.962 

.967 

1.00 

1.00 

1.00 


(4,8, 12,20)4 

.053 

.049 

.049 

.184 

.145 

.133 

.700 

.569 

.562 

.985 

.937 

.944 


( 11 , 11 , 11 , 11)4 

.078 

.054 

.048 

- 

.062 

.073 

- 

.156 

.149 

- 

.365 

.342 

(1,2, 3, 4)4 

(10,20,20,30)4 

.019 

.050 

.044 

- 

.085 

.100 

- 

.304 

.308 

- 

.669 

.673 


(4,8. 12,20)4 

.005 

.041 

.049 

- 

.065 

.062 

- 

.153 

.162 

- 

.345 

.309 


( 11 , 11 , 11 , 11)4 

.106 

.037 

.046 

- 

.094 

.102 

- 

.375 

.369 

- 

.774 

.766 

(1,1, 1,4)4 

(10,20,20,30)4 

.016 

.048 

.057 

- 

.142 

.149 

- 

.535 

.555 

- 

.931 

.929 


(4,8. 12,20)4 

.001 

.044 

.049 

- 

.084 

.089 

- 

.226 

.223 

- 

.516 

.522 


( 11 , 11 , 11 , 11)4 

.130 

.044 

.047 

- 

.096 

.097 

- 

.337 

.347 

- 

.729 

.735 

( 1 , 1 , 1 , 6)4 

(10,20,20,30)4 

.015 

.050 

.054 

- 

.132 

.137 

- 

.489 

.500 

- 

.899 

.904 


(4,8, 12,20)4 

.003 

.054 

.053 

- 

.079 

.078 

- 

.224 

.218 

- 

.463 

.478 


( 11 , 11 , 11 , 11)4 

.073 

.046 

.049 

- 

.072 

.069 

- 

.152 

.154 

- 

.360 

.342 

(4, 3, 2. 1)4 

(10,20,20,30)4 

.249 

.052 

.049 

- 

.084 

.086 

- 

.238 

.251 

- 

.561 

.571 


(4,8. 12,20)4 

.435 

.037 

.051 

- 

.056 

.062 

- 

.086 

.105 

- 

.164 

.199 


( 11 , 11 , 11 , 11)4 

.116 

.047 

.044 

- 

.103 

.104 

- 

.362 

.370 

- 

.767 

.771 

(4, 1,1, 1)4 

(10,20,20,30% 

.423 

.057 

.049 

- 

.188 

.193 

- 

.750 

.751 

- 

.992 

.993 


(4,8, 12,20)4 

.566 

.043 

.054 

- 

.074 

.086 

- 

.239 

.280 

- 

.605 

.677 


( 11 , 11 , 11 , 11)4 

.132 

.050 

.040 

- 

.110 

.103 

- 

.344 

.347 

- 

.731 

.742 

( 6 , 1 , 1 , 1)4 

(10,20,20,30)4 

.454 

.049 

.048 

- 

.178 

.188 

- 

.737 

.747 

- 

.987 

.990 


(4,8. 12,20)4 

.622 

.040 

.058 

- 

.068 

.098 

- 

.221 

.273 

- 

.578 

.655 


size a = 5%, in 3000 replications, using the same cell sizes, population cell means and standard deviations as listed in Table 1, the 
empirical sizes of the F, PB and MB tests are 62.2%, 4.0% and 5.8% respectively. That is, the times of the null hypothesis rejected 
by the F, PB and MB tests among 3000 runs are approximately 1866, 120, and 174 respectively. 

To further evaluate the performance of the F, PB and MB tests, in Table 2, we list the empirical sizes and powers of the three tests for 
testing if the interaction effect of the two factors is significant for more cell sizes and parameter configurations. For this purpose, 
we set Hij = ij5/(ab),i = 1, 2, • • • , a; j = 1, 2, • • • , b. This implies that only when 5 = 0, the null hypothesis is satisfied. The cell 
standard deviations and sizes are listed in the first two columns where we use w r to denote the vector obtained via repeating w 
for r times, e.g., (1, 3)2 = (1, 3, 1, 3). The empirical sizes of the three tests are listed in the three columns associated with 5 = 0, and 
the empirical powers are listed in the columns associated with 5 = 2,4 and 6. All the empirical sizes and powers were computed 
based on 3000 replications and for the PB test, the bootstrapped P-values were computed using 1000 simulation runs. To get more 
accurate bootstrapped P-values, more simulation runs for the PB test were preferred but the required computational effort forced 
us not to implement it in this simulation study. 

From Table 2, we first notice that both the PB and MB tests performed well and they were generally comparable in terms of size 
controlling and power. We then compare the PB and MB tests against the F test in more details. As expected, from Table 2, it is 
seen that for those cell variance homogeneity cases, the F test outperformed the PB and MB tests in terms of size controlling and 
power although the PB and MB tests did not lose too much power. This is not a surprise since the F test takes the cell variance 
homogeneity assumption into account while the PB and MB tests do not. However, for those cell variance heteroscedasticity cases, 
the PB and MB tests performed much better than the F test in terms of size controlling. The F-test is either too conservative (with 
empirical size as small as .1%) or too liberal (with empirical size as large as 62.2%). The F-test is not acceptable for heteroscedastic 
two-way ANOVA since this may frequently yield misleading conclusions in practice as in the example shown in Table 1. Since for 
these cell variance heteroscedasticity cases, the empirical sizes of the F test are very different from those of the PB and MB tests, it 
does not make too much sense to compare the powers of the F-test with those of the PB and MB tests. That is why the powers of 
the F-test are replaced with in Table 2. 
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TABLE 3 P-VALUES (x 100) OF THE F, PB AND MB TESTS FOR TESTING THE MAIN AND INTERACTION EFFECTS OF AGE AND SEX FOR 
THE BASSIN ANTICIPATORY TIMING DATA 


data used 
up to Trial 

F 

age 

PB 

MB 

F 

sex 

PB 

MB 

F 

age x sex 
PB 

MB 

1 

85.1 

84.3 

84.3 

82.6 

80.3 

80.5 

25.7 

18.3 

17.9 

2 

54.7 

50.9 

51.2 

99.9 

99.9 

99.9 

8.21 

3.04 

2.98 

3 

51.6 

48.2 

48.7 

72.5 

69.0 

69.8 

8.14 

3.59 

3.40 

4 

29.0 

26.7 

26.3 

58.6 

55.2 

54.8 

1.84 

0.52 

0.50 

5 

16.9 

14.9 

14.8 

32.1 

27.7 

27.4 

0.72 

0.13 

0.15 


From these two simulation studies, we conclude that the PB and MB tests are comparable in terms of size controlling and power 
but they generally outperform the F test in terms of size controlling. In view of the much computational effort required by the 
PB test and the difficulty for checking the equality of cell variances, the MB test should be considered as a preferred choice in 
two-way ANOVA regardless if the cell variances are equal or unequal. 


4 An Example 


In this section, for illustration and comparison, we applied the classical F, PB and MB tests to a real data set known as the bassin 
anticipatory timing data, collected for a study about the reaction times of human beings and their hand-eye coordinations. There 
are 113 subjects who visited the Human Systems Integration Laboratory, Monterey, California, on 21 October 1995. Each of the 
subjects participated in an experiment in which the subject anticipated the arrival of a light to a designated point and a light 
beam was broken. The difference between the arrival time of the light and the time the light beam was broken was recorded. 
The experiment was administered for 5 trials. For each subject, his/her age and sex were also recorded. For the details of the 
experiment and the data set, readers are referred to http://www.statsci.org/data/general/bassin.html. Figure 1 
displays the scatter plot of the anticipatory times of the subjects against their ages. It is seen that the anticipatory times of the 
subjects with ages below 20 spread much more widely than those of the subjects with ages above 20. It is then natural to group 
the ages into two categories: "age below 20" and "age above 20" so that the bassin anticipatory timing data can be treated under 
a heteroscedastic two-way ANOVA model where the two factors are "age" and "sex", each with 2 levels. 

Table 3 displays the P-values of the F, PB, and MB tests applied to test the main and interaction effects for age and sex. Five cases of 



FIG. 1 THE SCATTER PLOT OF THE ANTICIPATORY TIMES OF THE SUBJECTS AGAINST THEIR AGES 
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the cell sizes were considered. In the fc-th case for k = 1, 2, 3, 4, and 5, only the data up to Trial k are involved. Different amount of 
the data involved allows us to compare the performance of the three tests at various cell sizes. For good accuracy, the P-values of 
the PB test were computed based on 10000 simulation runs. It is seen that for testing the main effects, the conclusions drawn from 
the three tests for five different sample sizes are consistent. It is then natural to compare the three tests for testing the interaction 
effect in more details. It is seen that for each of the cases considered, the P-value of the F test is much larger than those of the PB and 
MB tests, suggesting that the F-test are generally less powerful in this example at the presence of cell variance heteroscedasticity. 
It is seen that at 5% significance level, the PB and MB tests detected the significance of the interact effect between age and sex 
using the data only up to Trial 2 while the F test failed to do that until the data up to Trial 4 were involved. Therefore, from this 
example, we see that in real data analysis, the F-test has a larger risk in getting a misleading conclusion than the PB or MB tests 
do when the cell variance heteroscedasticity is present. Since the PB test is time-consuming, the MB test is generally preferred. 


5 Conclusion 


In this paper, we propose and study a MB test and a PB test for linear hypotheses in heteroscedastic two-way ANOVA in a unified 
manner. Simulation studies demonstrate that in terms of size controlling, both the tests outperform the classical F-test even for 
moderate heteroscedasticity and the classical F-test is either too conservative or too liberal when the homogeneity assumption is 
violated. Although both the tests are comparable in terms of size controlling, we recommend the MB test since it requires much 
less computational effort than the PB test does. Note that the weights used for the MB and MB tests can be specified by either the 
equal-weight method or the size-adapted-weight method. When the cell sizes are near the same, both the weight methods can be 
used. However, when the cell sizes are quite different, the size-adapted-weight method should be used so that the effect of the 
cell sizes can be taken into account. 
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APPENDIX: Technical Proofs 


Proof of Theorem 1 Under the given conditions, we have 

o-ij = cTijXnv-i/iriij - 1), * = 1,2, • • • ,a;j = 1,2, ••• ,6. (A.l) 

It follows that (£?• - = °p( n ij 3/2 ) i * = % 2, • • • , a; j = 1, 2, • • • , 6. Thus E - E = O p (n" f n /2 ). Noticing that E = 0(n“;J, 

we further have 

R=H(± E )H t = O p (nJ' 2 ), (A.2) 

where H = (CEC |T ) _1//2 C is defined in Section 2 and H = 0(n^ 2 n ). This implies that 

W = I q + H(±- E )H t = I q + R = I q + O p (n~^ 2 ). (A3) 

Theorem 1 follows by Slutsky's theorem and noticing that under Ho, z T z ~ \q- 

Proof of Theorem 2 Notice that under Ho, we have 2 ~ IV(0, I q ). Using the conditional expectation rule and some simple algebra 
leads to 

E (T) = E tr( W^ 1 ) and E(T 2 ) = 2E tr(W~ 2 ) + E tr 2 (PP _1 ). (A.4) 

From the proof of Theorem 1 and (A.2), we have that W = I q + R with R = O p (n~f^ 2 ). Then we have W 1 = ( I q + J?) -1 = 
I q — R + R? — R s + Op(n" 2 J and W~ 2 = (I q + R)~ 2 = I q — 2 R + 3 R 2 — 4 R 3 + Op(n“ 2 n ). It is easy to see from (A.2) that 
E(i?) = 0 and Etr(J2) = 0. Thus 

EtrCIV' 1 ) = q + Etr(J? 2 ) - Etr (H 3 ) + 0(n- 2 J, 

Etr(VR- 2 ) = q + 3Etr(R 2 ) - 4Etr(.R 3 ) + 0(n~ 2 J, (A.5) 

Etr 2 (W- 1 ) = q 2 + Etr 2 (Ji) + 2gEtr(.R 2 ) - 2gEtr(.R 3 ) - 2Etr(fi)tr(R 2 ) + 0(n“ 2 n ). 

To find Etr(f? 2 ), Etr 2 (ii), • • •, notice that by (A.2), we have 

a b 

*=EE hij hij Uij , (A. 6) 
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where hij is the (ij)-th column of H and Uij = 


E«) = 

Mil, Wl2, ' 




— , and E(?4 ) = 


n lA n ij- L 2 


i = 1, 


, Uab are independent and by (A.6), we have 


, for i = 1,2, j = 1,2, •••,6. By (A.l), we have E(u;y) = 0, 
• , a;j = 1, • • • , 6, where we use E (Xd/d — l) 3 = 8/d 2 . Noticing that 


Etr(ft 2 ) 

Etr(R :i ) 


Etr 2 (ii) = EU E.y =1 = 2A 

Etr(Ji)tr(J? 2 


* 2^i=i 2^j = i 


v ij ,v %3 ) 

= E“ =1 EL(e>5) 3 e(4 


_LL h-h 
ri tJ riij 


(■ nn - 1) 2 = 0(nj n ), 


(A .7) 


where A = E“=i E 


- 13 - liL hij 

n,- ,• l J L J 


/(«». 


1). Combining (A.5) and (A. 7) gives that Etr(VR *) = Q+2A+0(n m 2 n ), Etr(W 2 ) = 


<?+6A + 0(n m 2 n )> and Etr 2 (W x ) = ( 7 2 + (8g+2)A+0(n m 2 n ) -These, together with (A.4), yield that E(T) = q [l + ^-] + 0(n m 2 J 
and E(T 2 ) = q(q + 2) [l + E 2 -] + £K n m?n)' where ai = 2n ™ in A and «2 = ( - 14 ^^) min A as desired. 


To find the lower and upper bounds of A as given in (16), for i = 1, 2, • • • , a; j = 1, 2, • • • , b, set Sij = ^Ahijhfj. It is easy to see 

^2 3 

that Sij is nonnegative and it has only one nonzero eigenvalue Sij = h 1 ,h . , . It is easy to verify that EE E J= i E; = Iq so 


that Ei=i Ey=i E ~ E»=i Ey. ,i h-ijhij — tr(E i=1 Ey=i Ei) — 9 and / 9 Sij - E 


„. Therefore I q — S',, is 


i /-jj = i ‘-'o; — y '-’o — 1 

nonnegative, showing that the only nonzero eigenvalue Sij of S, :l is less than 1. It follows that A = EE E . $ij/(mj - 1) < 
q/(n m in - 1) and A = ELi Ey=i S ij/( n ij - 1) > ELi Ey=i ^ 2 /(n ma x - 1) > (? 2 /[(n max - l)a&] where we have used the fact 
that for any nonnegative numbers vi, V 2 , ■ ■ ■ , v m , we always have E"=i v i — (EE, vi) 2 /m. The theorem is then proved. 


Proof of Theorem 3 From the definition (12) of T, it is easy to see that T is invariant under the transformation (19). Then by (18), 
we only need to show that A is invariant under the transformation (19). For i = 1, 2, • • • , a; j = 1, 2, • • • , b, let cyy and cyy be the 
(ijf)-th columns of the matrices C and C = PC respectively. Then it is easy to see that c;y = Pcij, i = 1, 2, • • • , a; j = 1, 2, • • • , b. 
It follows from (17) that 


A = 


as desired. The theorem is proved. 


EE( tt " •') 1 

i= 1 1=1 

EEk-d - 1 

i= 1 3= 1 

EE^-i )- 1 


2 

ij ~T / 
ij\ 
ij 


^c}j{ csc T )- 1 e i: 


ij -cJ'jP T (PCT,C 1 P 1 )~ L Pdj 


iT n T,-l , 


i = 1 3=1 


^cJ j {CtC T )- 1 dj 


= A, 


Proof of Theorem 4 This theorem is proved if we can show that both T and A are invariant under the affine transformation 
(20). Let Hij, tj 2 j and p,ij,S 2 j respectively denote the cell mean and variance of ytjk,k = 1,2,---, n,, before and after the affine 
transformation (20). Then we have Hij = \Hij + £ and afj = A 2 <jT. It follows that Hij = A~ 1 (nij — £)• As we define the mean 
vector h and the variance matrix £ in Section 2, we define h and S similarly. Then we have h = A h + £ and S = A 2 S, where 
£ = £lfc. It follows that the GLHT problem (10) can be equivalently expressed as Ho ■ C jj, = c, vs Hi : Cfi ^ c, where 
C = A~ X C and c = A _1 C^ + c. 


r, 

Let Hij, Sij and Hij, Sij respectively denote the unbiased estimators of the cell mean and variance of ytjk, k = 1, 2, • • • , mj before 
and after the affine transformation (20). Then it is easy to see that /L i? = A Hij + / and a.ij = A 2 S 2 j. It follows that jl = \fi + £ and 

S = A 2 S. Using the above, we have Cp,-c = A- 1 C(A/i + £) - (A _1 Ci A c) = C h - c and CSC T = A- 1 C(A 2 S)(A~ 1 C) T = 
CT,C t . Thus, both Cfi — c and C£C T are affine-invariant. So is T. 


We now turn to show that A is invariant under the affine transformation (20). Let A be the affine-transformed A. Then by (17), 
we have 


A 


i= 1 1 = 1 
a b 

EE ( n; i _ x ) 

i=i i=i 
a b 

EE^- 1 )' 


-Hcfj(ctc T y 1 i 


A 2 — A- 1 c^(CEC J ) _1 A 


T\- lx-1 


i= 1 3 = 1 


-2 

ij T f A — 1 _ 


= A, 


14 



Statistics Research Letters (SRL), Volume 5, 2016 


www. srl-journal.org 


as desired. The theorem is then proved. 

Proof of Theorem 5 The theorem is proved if we can show that both T and A are invariant under different labelling schemes of the 
means /Mj, i = 1,2 ,■■■ ,a;j = 1, 2, • • • , b. Let • • • , i a andji, j 2 , ■■■ ,jb be any permutations of 1, 2, • • • , a and 1,2 respec- 
tively. Then it is easy to see that C/1 = E t “ =1 ELi Wi/W = E“=i ELi c '.J- A* , i 


and CEC T = E'.' a E = 


E a —1 ~ 2 


Ci„ 


„ c' iu j v , showing that Cfi and CSC are invariant under different labelling schemes of the cell means 


and so is T. 

The invariance of A under different labelling schemes of the population means follows from (17) directly by noticing that 


A = ^2^2(rnj - 1) 


i=l j=l 


-2 
a ij T 


4-(csc J 




U = 1 j = 1 


cr, 

n i u j 


^ T. 


,(CSC 5 


and that CSC T is invariant under different labelling schemes of the cell means. This completes the proof of the proposition. 
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